On Computing Condensed Frequent Pattern Bases

Authors

  • Jian Pei
  • Guozhu Dong
  • Wei Zou
  • Jiawei Han
Abstract

Frequent pattern mining has been studied extensively. However, its effectiveness and efficiency are often limited because the number of frequent patterns generated is often too large. In many applications it is sufficient to generate and examine only frequent patterns whose support is known to a close-enough approximation rather than in full precision. Such a compact but close-enough frequent pattern base is called a condensed frequent pattern-base. In this paper, we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent pattern-bases. A few algorithms for computing such pattern-bases are proposed. Their effectiveness at pattern compression and their efficient computation methods are investigated. A systematic performance study is conducted on different kinds of databases, which demonstrates the effectiveness and efficiency of our approach at handling frequent pattern mining in large databases.
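
To make the idea of a "close-enough" pattern base concrete, here is a minimal Python sketch. It is not one of the algorithms proposed in the paper: the function names (`frequent_itemsets`, `condense`, `estimate_support`), the brute-force miner, and the reading of "close enough" as an absolute error bound `k` on support are all assumptions made for illustration. A frequent pattern is dropped when a larger pattern already kept in the base has support within `k` of it, so the support of any frequent pattern can be recovered to within `k` from the base alone.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Brute-force miner (illustrative only): map each frequent itemset to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_sup:
                result[candidate] = support
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return result


def condense(frequent, k):
    """Keep a pattern only if no kept proper superset has support within k of it.

    Patterns are visited from largest to smallest, so every superset that
    could cover a pattern has already been decided when it is examined.
    """
    base = {}
    for pattern in sorted(frequent, key=len, reverse=True):
        support = frequent[pattern]
        covered = any(
            pattern < kept and support - kept_support <= k
            for kept, kept_support in base.items()
        )
        if not covered:
            base[pattern] = support
    return base


def estimate_support(base, pattern):
    """Lower-bound estimate: the largest support among kept supersets of pattern.

    For a frequent pattern the true support lies in [estimate, estimate + k].
    Returns None when no kept pattern contains the query.
    """
    pattern = frozenset(pattern)
    candidates = [s for kept, s in base.items() if pattern <= kept]
    return max(candidates) if candidates else None
```

With `k = 0` this sketch keeps exactly the closed frequent itemsets; larger values of `k` trade precision for a smaller base.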

Similar resources

An Efficient One-pass Method for Discovering Bases of Recently Frequent Episodes over Online Data Streams

The knowledge embedded in an online data stream is likely to change over time due to the dynamic evolution of the stream. Consequently, in frequent episode mining over an online stream, frequent episodes should be adaptively extracted from recently generated stream segments instead of the whole stream. However, almost all existing frequent episode mining approaches find episodes frequently occu...

A new algorithm for computing SAGBI bases up to an arbitrary degree

We present a new algorithm for computing a SAGBI basis up to an arbitrary degree for a subalgebra generated by a set of homogeneous polynomials. Our idea is based on linear algebra methods which cause a low level of complexity and computational cost. We then use it to solve the membership problem in subalgebras.

Separating Structure from Interestingness

Condensed representations of pattern collections have been recognized to be important building blocks of inductive databases, a promising theoretical framework for data mining, and recently they have been studied actively. However, there has not been much research on how condensed representations should actually be represented. In this paper we propose a general approach to build condensed repr...

Efficient Frequent Pattern Mining Based on a Condensed Tree Structure

In this paper, we present an efficient tree structure and its associated algorithm for discovery of frequent patterns from a large data set. We demonstrate the effectiveness of our algorithm and performance improvement over the existing approach CATS which is one of the fastest frequent pattern mining algorithms known to date.

Représentation condensée en présence de valeurs manquantes

Missing values are an old problem that is very common in real data bases. We describe the damages caused by missing values on condensed representations of patterns extracted from large data bases. This is important because condensed representations are very useful to increase the efficiency of the extraction and enable new uses of patterns (e.g., rules with minimal body, clustering, classificat...

Publication date: 2002